A ResNet-Based Audio-Visual Fusion Model for Piano Skill Evaluation
Abstract
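With the rise in piano teaching in recent years, many people have joined the ranks of piano learners. However, the high cost of traditional manual instruction and the exclusive one-on-one teaching model have made learning the piano an extravagant endeavor. Most existing approaches, based on the audio modality, aim to evaluate piano players’ skills. Unfortunately, these methods overlook the information contained in videos, resulting in a one-sided and simplistic evaluation of the piano player’s skills. More recently, multimodal-based methods have been proposed to assess the skill level of piano players by using both video and audio information. However, existing multimodal approaches use shallow networks to extract video and audio features, which limits their ability to extract complex spatio-temporal and time-frequency characteristics from piano performances. Furthermore, the fingering and pitch-rhythm information of the piano performance is embedded within the spatio-temporal and time-frequency features, respectively. Therefore, we propose a ResNet-based audio-visual fusion model that is able to extract both the visual features of the player’s finger movement track and the auditory features, including pitch and rhythm. The joint features are then obtained through the feature fusion technique by capturing the correlation and complementary information between video and audio, enabling a comprehensive and accurate evaluation of the player’s skill level. Moreover, the proposed model can extract complex temporal and frequency features from piano performances. Firstly, ResNet18-3D is used as the backbone network for our visual branch, allowing us to extract feature information from the video data. Then, we utilize ResNet18-2D as the backbone network for the aural branch to extract feature information from the audio data. The extracted video features are then fused with the audio features, generating multimodal features for the final piano skill evaluation. The experimental results on the PISA dataset show that our proposed audio-visual fusion model, with a validation accuracy of 70.80% and an average training time of 74.02 s, outperforms the baseline model in terms of performance and operational efficiency. Furthermore, we explore the impact of different layers of ResNet on the model’s performance. In general, the model achieves optimal performance when the ratio of video features to audio features is balanced. However, the best performance achieved is 68.70% when the ratio differs significantly.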
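The fusion step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not name the fusion operator, so concatenation is assumed here, and the random vectors stand in for the outputs of the ResNet18-3D (video) and ResNet18-2D (audio) branches. The feature sizes, the number of skill levels, and all function names are hypothetical.

```python
import math
import random


def fuse_features(video_feat, audio_feat):
    """Fuse two feature vectors by concatenation (assumed operator, not confirmed by the source)."""
    return list(video_feat) + list(audio_feat)


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_classifier(in_dim, num_classes, seed=0):
    """Randomly initialised (untrained) classifier head, for illustration only."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(num_classes)]
    bias = [0.0] * num_classes
    return weights, bias


def predict_skill(video_feat, audio_feat, weights, bias):
    """Return the predicted skill-level index and the class probabilities."""
    joint = fuse_features(video_feat, audio_feat)
    probs = softmax(linear(joint, weights, bias))
    level = max(range(len(probs)), key=probs.__getitem__)
    return level, probs


# Stand-in branch outputs; a balanced 8:8 ratio of video to audio features.
rng = random.Random(42)
video_feat = [rng.random() for _ in range(8)]
audio_feat = [rng.random() for _ in range(8)]

NUM_LEVELS = 5  # hypothetical number of skill levels
weights, bias = make_classifier(len(video_feat) + len(audio_feat), NUM_LEVELS)
level, probs = predict_skill(video_feat, audio_feat, weights, bias)
print("predicted level:", level)
```

Changing the lengths of `video_feat` and `audio_feat` changes the ratio of video to audio features in the joint vector, which is the quantity the abstract reports as affecting performance.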
Similar resources
ResNet and Model Fusion for Automatic Spoofing Detection
Speaker verification systems have achieved great progress in recent years. Unfortunately, they are still highly prone to different kinds of spoofing attacks such as speech synthesis, voice conversion, and fake audio recordings. Inspired by the success of ResNet in image recognition, we investigated the effectiveness of using ResNet for automatic spoofing detection. Experimental results on t...
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
The trend towards increasingly deep neural networks has been driven by a general observation that increasing depth increases the performance of a network. Recently, however, evidence has been amassing that simply increasing depth may not be the best way to increase performance, particularly given other limitations. Investigations into deep residual networks have also suggested that they may not...
Noise-based audio-visual fusion for robust speech recognition
A major goal of current speech recognition research is to improve the robustness of recognition systems used in noisy environments. Recent strides in computing technology have allowed consideration of systems that use visual information to augment the decision capability of the recognizer, allowing superior performance in these difficult environments. A crucial area of research in audiovisual s...
Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the results of a separate audio and video classifier on the decision level. ...
Bayesian Robustification for Audio Visual Fusion
We discuss the problem of catastrophic fusion in multimodal recognition systems. This problem arises in systems that need to fuse different channels in non-stationary environments. Practice shows that when recognition modules within each modality are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We expl...
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13137431